# Optimization and Big Data, Ezgi Eren, eeren@pros.com
+ Pricing nd Revenue Management
    - maximize , expected demand, limited inventory
    - car rental pricing and revenue management
    - competitor based pricing
        - how much more cheaper or expensive you are 
    - channel strategies (how they book thru orbiz or walkin)
    - fleet distribution
        - keping resource free
        - need more data to plan this


+ Problem complexity
- millions and billions of constaints and decision variables
  + product
      - location 
    - brand
    - car type
    - check out date
    - length of rental 
  + Channel (related to cust behaviour)  
    - booking channel
    - customer segment
    - booking horizon
    - price tier

+ Channel Strategy Optimization
- for each product --> assume a demand limit --. optimize acroiss channela --> increase demand limit

+ Reduced P&D Optimization
chart showing capacity constraints


+ Optimisation steps and size

----
# Esther Kundin, Bloomberg 

Outline
- problem space
    - 2 TB of data
    - avg write is 4 billion /day

2. high availability

    - HBase10070 and HDFS - region server - meta region server - client
    - Fixed above problem with backup secondary servers 
        - HDFS - 
    - 3 options 
        - HBase 12259 - hydrabase integration          
     - global zookeeper - Hbase 1 and 2 withreplication with two writers
     + Takeaways - keep multiple data centers.
     
3. high freequency
  - Fine tuning 
  - use data to set heuristics
  -  

----

# Sandy, SEAD: Infrastructure - and NSF partner
+ data in the long tail 
+ Long tail data- data sets can be small - say environmental data, bird sightings, etc
+ image data, geo gifs, etc
+ datasets are published 

Sead as Data Infrastructure
- procvide project spaces
 - staging area for publishing 
- 

----

# Bioinformatics track

## Genetic variant
- change in dna sequence
- base sequence + codons + aminoacids + protein
- diseases like cance are caused by genetic variants
- DNA sequencing for genome data analysis - short snippets that overlap, so we first reconstruct the genome via pattern mining and read alignment, so we use a refernce for gene reconstruction

#### VARIANT CALLING - problem with data
+ challenge 1: identify the true variants
+ challenge 2:process data in reasonable time - example calling a single human Chrf at 60x coverage took overnight (one night).

#### In memory DB
- IMDB's keep relevant data in main memory
- Toolbox

#### Algorithm 
+ Data preprocessing
  - extract - filter low quality data
  - group - base occurence for each position
  - parallelize - readwise
  
+ Genotype calling 
     - determine the  actual base type calling
     - derive the genotype with Bayes theorem for each genotype

 + Prior probability
  - Distinguish if the genotype is homozygous and equal to reference 
+ Genotype likelihood
  
#### Results
+ TRadiional : explicit indexing, compression, sorting
+ Upto 46x faster, implicit indexing and compression
+ Variant calling
  - both aproaches scale equally well
  - No I/O bottleneck, 15x faster than GATK

## Lauren Edwards : MIT Lincoln LAbs

----

# Workshops : Finding a Research Topic, A.J.Brush (Microsoft Research), Valerie Taylor, Texas A&M/CMD-IT, www.cra-w.org, twitter(@CRAWomen)
+ what questions do you have about finding a research topic

### Key takeaways 
+ The path to a research topic is zigzag
+ topic machange
+ ok to span two fields

### selecting  a topic
+ moving from coursework to picking a topic is often a low point
+ electives are the way to find the topic and exploring your research topic
+ passing exams, moving from coursework to research, first publication

### Thesis Equation
+ __Topic + Advisor = Dissertation__

### Advisor Vs. Research areas
OPINION: Picking a good, matching advisor is important!
+ do you have an interest in the topic - passion.
+ An advisor is for life. 
+ He/She can teach/mentor you in many things, not just research.
+ You will be less stressed.

### Selecting an adi=visor
+ Working style is important
- do some background research about the faculty research
- talk with current students about working style and the expectations.
+ be honest with yourself and get to know your working style.

### Finding a research topic
The path to success conssts of 3 simple elements: Find what interests you that you can do well, and is needed by the people.
+ your gifts, your interests and societal needs in a venn diagram is your RESEARCH topic.

### Focussingfrom Area to topic
+ Area = subfield.
 - arch, ai, hpc, or interdisciplinary
 - how important, are there jobs in that area, etc..
+ Topic = specific open problems in subfield.
 - theory: probably better algo
 - AI: improving a ML algo.
 - Arch: reliabilityand approx computing
 - Systems: reliability, big data
 - interdiscilinary: deep learning, big data
 
### Finding your Strength
- what is easier for you
  - writing and modifying 

###Topic Scale and scope
SCALE: 
- shuld be big enough to have more than one problem
SCOPE

### More things to consider
+ whose interest is important - your advisor, research community.
+ dont chase hot topics
+ love your topics : 

### Interdisciplinary research topic
+ EX: AI+systems, HCI+ Soft Engg, AI+Biology/Med

__Benefits__
- find jobs 
- be good in one atleast

### Your inerests 
+ what makes you interested? 

### 9 ways to identify the research topif
1. find social needs
2. 
     

 
 
       
